[% setvar title Regex: Make /$/ equivalent to /\z/ under the '/s' modifier %]
<div id="archive-notice">
    <h3>This file is part of the Perl 6 Archive</h3>
    <p>To see what is currently happening visit <a href="http://www.perl6.org/">http://www.perl6.org/</a></p>
</div>
<div class='pod'>
<a name='TITLE'></a><h1>TITLE</h1>
<p>Regex: Make /$/ equivalent to /\z/ under the '/s' modifier</p>
<a name='VERSION'></a><h1>VERSION</h1>
<pre>  Maintainer: Bart Lateur &lt;<a href='mailto:bart.lateur@skynet.be'>bart.lateur@skynet.be</a>&gt;
  Date: 28 Sep 2000
  Last Modified: 1 Oct 2000
  Mailing List: <a href='mailto:perl6-language-regex@perl.org'>perl6-language-regex@perl.org</a>
  Number: 332
  Version: 2
  Status: Frozen</pre>
<a name='ABSTRACT'></a><h1>ABSTRACT</h1>
<p>To most Perlers, /$/ in a regex simply means &quot;end of string&quot;. This is
only right, if you're absolutely sure your string doesn't end in a
newline, as is commonly the case in a large part of all textual data:
ordinary strings don't contain newlines. Lines coming from text files
can generally only contain a newline as the very last character. The
'/s' modifier is usually only used in combination with the former class
of textual data.</p>
<p>However, this situation is basically a bug hole.</p>
<p>This RFC proposes to change the '/s' modifier so that under '/s', /$/
will only match at the very end of a string, and not also before a
newline at the end of the string.</p>
<a name='CHANGES'></a><h1>CHANGES</h1>
<ul>
<li><a name='Added section about alternatives, such as the addition of the /$$/ syntax'></a>Added section about alternatives, such as the addition of the /$$/
syntax</li>
<li><a name='Added section about /z/ and /\Z/'></a>Added section about /z/ and /\Z/</li>
<li><a name='Expanded section on '/ms'.'></a>Expanded section on '/ms'.</li>
<li><a name='Complete rewrite of &quot;MIGRATION&quot; section'></a>Complete rewrite of &quot;MIGRATION&quot; section</li>
</ul>
<a name='NOTES ON FREEZING'></a><h1>NOTES ON FREEZING</h1>
<p>Since nobody had any more remarks on the draft, and the deadline is
nigh, I'm freezing it as is.</p>
<a name='DESCRIPTION'></a><h1>DESCRIPTION</h1>
<p>To most Perl programmers, /^foo$/ is a regex that can only match the
string &quot;foo&quot;. It's not, actually: it can match &quot;foo\n&quot;, too. This
assumption is usually safe, because people know the kind of data they're
dealing with, and they &quot;know&quot; that it won't ever end in a newline.</p>
<p>However, this basically is a chance for bugs to creep in, if for some
reason this assumption about the data no longer holds.</p>
<p>To make matters worse, Perl doesn't even have a mechanism to prevent the
regex engine from matching /$/ at just before the last character if it's
a newline.</p>
<p>Originally, we had thought of adding Yet Another Regex Modifier; but to
be honest, having 2 modifiers just for the newline is already confusing
enough, for too many people. A third is definitely out.</p>
<p>Therefore, the proposal is instead to modify the behaviour of the '/s'
modifier.</p>
<p>Under '/s':</p>
<ul>
<li><a name='/./ can match any character, including newline;'></a>/./ can match any character, including newline;</li>
<li><a name='/$/ can match only at the very end of the string, not also in front of a last character, if it happens to be a newline.'></a>/$/ can match only at the very end of the string, not also in front of a
last character, if it happens to be a newline.</li>
</ul>
<p>This seems simple enough.</p>
<a name='CONSIDERATIONS'></a><h1>CONSIDERATIONS</h1>
<a name='Mnemonic value of '/s''></a><h2>Mnemonic value of '/s'</h2>
<p>'/s' originally stood for &quot;single line&quot;. This can no longer be true, the
mnemonic value of the &quot;s&quot; is thereby reduced to zero.</p>
<p>However, the mnemonic value wasn't that great to begin with, especially
if you consider that combining '/s' and '/m' is not only possible, but a
useful option, too. How can a string both be a single line and
multiline, at the same time?</p>
<p>So, to most Perl programmers, '/s' simply stands for</p>
<ul>
<li><a name='let /./ match a newline too'></a>let /./ match a newline too</li>
</ul>
<p>which now gets turned into:</p>
<ul>
<li><a name='treat &quot;\n&quot; as an ordinary character'></a>treat &quot;\n&quot; as an ordinary character</li>
</ul>
<p>The change isn't that big, so it is just as easy to remember. Or not.</p>
<a name='The $* variable'></a><h2>The $* variable</h2>
<p>'/s' and '/m' also have a lesser known side effect: they both override
the setting of the $* special variable, which controls multiline related
behaviour in regexes.</p>
<p>Use of this special variable has already been deprecated at least since
Perl5 first came out, more than 5 years ago. It is a very good candidate
to be removed from Perl6 altogether, which would result in fewer
gotcha's in the language. That is a Good Thing.</p>
<p>Perlvar says:</p>
<pre>    Use of `$*' is deprecated in modern Perl, supplanted by the `/s'
    and `/m' modifiers on pattern matching.</pre>
<p>Therefore, any changing behaviour of '/s', with regards to $*, can
nowadays hardly be considered relevant, any more.</p>
<p>See also <i><a href='#RFC 347: Remove long-deprecated $*'>&quot;RFC 347: Remove long-deprecated $*&quot;</a></i></p>
<a name=''/ms': combined '/m' and '/s''></a><h2>'/ms': combined '/m' and '/s'</h2>
<p>The '/m' option makes /$/ match either at the end of the string, or
before <i>any</i> newline. Adding the '/s' modifier won't change that. As a
result, '/ms' still works as before. Internally, '/m' has taken over the
job of matching before a newline at the very end of the string, simply
because /$/m can match before <i>every</i> newline.</p>
<a name='/\z/ and /\Z/'></a><h2>/\z/ and /\Z/</h2>
<p>The behaviour of /\z/ and /&lt;Z/ will remain unaltered, under all
cicumstances.</p>
<p>Currently, /$/ is a synonym for /\Z/, even under '/s'. With the modified
'/s', /\Z/ will retain its old meaning, thereby these will no longer be
synonyms under all circumstances.</p>
<a name='MIGRATION'></a><h1>MIGRATION</h1>
<p>The Perl5 To Perl6 converter can replace every occurence of</p>
<pre>    /$/s</pre>
<p>with</p>
<pre>    /\Z/s</pre>
<p>However, it's not unlikely that currently having /$/s in your regexes,
is actually a bug in your script, but you don't care because the data
won't ever make it visible. A warning if /$/ is found in combination
with a bare '/s' modifier, not combined with '/m', in addition to
replacing it with /\Z/, might be a nice idea.</p>
<a name='IMPLEMENTATION'></a><h1>IMPLEMENTATION</h1>
<p>Under '/s', make '$' behave as /\z/ does now.</p>
<a name='ALTERNATIVES'></a><h1>ALTERNATIVES</h1>
<p>Some people, in particular Hugo, would rather add a special syntax for
this special case, and leave /$/ under '/s' alone. This seems not to win
the favour of too many other people (actually, the sample is too small
to get reliable statistics), for these reasons:</p>
<ul>
<li><a name='$$ already has a meaning: it is the special variable containing the PID.'></a><code>$$</code> already has a meaning: it is the special variable containing the
PID.</li>
<li><a name='We already have an alternative: /\z/. We don't want to just add Yet Another Alternative. We want /$/ to do the Right Thing.'></a>We already <i>have</i> an alternative: /\z/. We don't want to just add Yet
Another Alternative. We want /$/ to do the Right Thing.</li>
</ul>
<a name='REFERENCES'></a><h1>REFERENCES</h1>
<p>perlre, about '/s' and '/m'</p>
<p>perlvar, section about $*</p>
<p>RFC 347: Remove long-deprecated $* (aka $MULTILINE_MATCHING)</p>
</div>
